
    Training Knowledge Graph Embedding Models

    Knowledge graph embedding (KGE) models have become popular means for making discoveries in knowledge graphs (e.g., RDF graphs) in an efficient and scalable manner. The key to the success of these models is their ability to learn low-rank vector representations for knowledge graph entities and relations. Despite the rapid development of KGE models, state-of-the-art approaches have mostly focused on new ways of representing embedding interaction functions (i.e., scoring functions). In this paper, we argue that the choice of other training components, such as the loss function, hyperparameters and negative sampling strategies, can also have a substantial impact on model efficiency. This area has so far been rather neglected by previous works, and our contribution towards closing this gap is a thorough analysis of the possible choices of training loss functions, hyperparameters and negative sampling techniques. We finally investigate the effects of specific choices on the scalability and accuracy of knowledge graph embedding models.
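
    To make the training components discussed above concrete, the sketch below shows one common combination in PyTorch: a TransE-style scoring function, uniform negative sampling by corrupting tails, and a margin ranking loss. It is a minimal illustration, not the paper's implementation; the entity/relation counts, margin and learning rate are arbitrary placeholders.

        import torch
        import torch.nn as nn

        class TransE(nn.Module):
            """Distance-based scorer: lower score = more plausible triple."""
            def __init__(self, n_entities, n_relations, dim=100):
                super().__init__()
                self.ent = nn.Embedding(n_entities, dim)
                self.rel = nn.Embedding(n_relations, dim)
                nn.init.xavier_uniform_(self.ent.weight)
                nn.init.xavier_uniform_(self.rel.weight)

            def score(self, h, r, t):
                # ||h + r - t||_1 for batches of (head, relation, tail) index tensors
                return (self.ent(h) + self.rel(r) - self.ent(t)).abs().sum(dim=-1)

        n_entities, n_relations = 1000, 50              # illustrative sizes
        model = TransE(n_entities, n_relations)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        def train_step(h, r, t, margin=1.0):
            """One optimisation step: margin ranking loss with uniform tail corruption."""
            neg_t = torch.randint(0, n_entities, t.shape)   # negative sampling
            loss = torch.relu(margin
                              + model.score(h, r, t)        # positive triples
                              - model.score(h, r, neg_t)    # corrupted triples
                              ).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()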

    Embedding Cardinality Constraints in Neural Link Predictors

    Neural link predictors learn distributed representations of entities and relations in a knowledge graph. They are remarkably powerful in link prediction and knowledge base completion tasks, mainly because the learned representations capture important statistical dependencies in the data. Recent works in the area have focused on either designing new scoring functions or incorporating extra information into the learning process to improve the representations. Yet the representations are mostly learned from the observed links between entities, ignoring commonsense or schema knowledge associated with the relations in the graph. A fundamental aspect of the topology of relational data is the cardinality information, which bounds the number of predictions for a relation between a minimum and maximum frequency. In this paper, we propose a new regularisation approach to incorporate relation cardinality constraints into any existing neural link predictor without affecting its efficiency or scalability. Our regularisation term aims to impose boundaries on the number of predictions with high probability, thus structuring the embedding space to respect commonsense cardinality assumptions and resulting in better representations. Experimental results on Freebase, WordNet and YAGO show that, given suitable prior knowledge, the proposed method positively impacts the predictive accuracy of downstream link prediction tasks. Comment: 8 pages, accepted at the 34th ACM/SIGAPP Symposium on Applied Computing (SAC '19).
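
    One plausible reading of such a cardinality regulariser, sketched here under my own assumptions rather than the paper's exact formulation: for each (head, relation) pair, form a soft count of predicted tails from the model's scores and penalise counts that fall outside known [min, max] bounds.

        import torch

        def cardinality_penalty(scores, lo, hi):
            """scores: (batch, n_entities) raw scores over all candidate tails for each
            (head, relation) pair; lo, hi: per-row cardinality bounds, shape (batch,).
            The soft count approximates the number of high-probability predictions."""
            soft_count = torch.sigmoid(scores).sum(dim=-1)
            return (torch.relu(soft_count - hi) + torch.relu(lo - soft_count)).mean()

        # Added to the usual link-prediction objective, e.g.
        # loss = link_prediction_loss + reg_weight * cardinality_penalty(scores, lo, hi)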

    Training, maintenance and repair of coffee (Coffea arábica) pulping machines for 100 coffee growers in the municipality of Taminango, Nariño

    In Colombia, and generally in the department of Nariño, including the municipality of Taminango, coffee growers are highly dedicated to field work. Unfortunately, over recent years they have neglected a very important practice that directly affects the quality and price of their product: the maintenance and repair of the coffee pulping machine, a vital piece of equipment. If pulping is not carried out properly, defects such as bitten and hulled beans appear, which noticeably reduce the quality and price of the coffee (Coffea arábica). The main objective of this work was to train the coffee growers of the municipality of Taminango to keep their pulpers in excellent condition so that pulping can be carried out properly, ensuring good-quality beans. The project counted on the direct participation of the Federación Nacional de Cafeteros, the departmental committee of Nariño and the municipality's extension service. Thanks to the involvement of these institutions, the coffee growers benefiting from the project and students of the Universidad Nacional Abierta y a Distancia, 100 coffee growers were trained in wet and dry coffee processing, the identification and management of coffee defects, and the maintenance and repair of coffee pulping machines. Likewise, 100 pulping machines were repaired, including painting and sheet-metal work, replacement of the copper liner, replacement of the bearings and calibration, ensuring that the equipment is in good condition to process the harvest and obtain good-quality beans.

    Accurate prediction of kinase-substrate networks using knowledge graphs

    Phosphorylation of specific substrates by protein kinases is a key control mechanism for vital cell-fate decisions and other cellular processes. However, discovering specific kinase-substrate relationships is time-consuming and often rather serendipitous. Computational predictions alleviate these challenges, but the current approaches suffer from limitations like restricted kinome coverage and inaccuracy. They also typically utilise only local features without reflecting broader interaction context. To address these limitations, we have developed an alternative predictive model. It uses statistical relational learning on top of phosphorylation networks interpreted as knowledge graphs, a simple yet robust model for representing networked knowledge. Compared to a representative selection of six existing systems, our model has the highest kinome coverage and produces biologically valid high-confidence predictions not possible with the other tools. Specifically, we have experimentally validated predictions of previously unknown phosphorylations by the LATS1, AKT1, PKA and MST2 kinases in humans. Thus, our tool is useful for focusing phosphoproteomic experiments, and facilitates the discovery of new phosphorylation reactions. Our model can be accessed publicly via an easy-to-use web interface (LinkPhinder).
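
    As a purely illustrative sketch of the knowledge-graph framing (not the LinkPhinder implementation; the kinase, motif and substrate labels are only examples), known phosphorylations can be cast as (kinase, relation, substrate) triples so that any link predictor's scoring function can rank unseen kinase-substrate pairs:

        # Known kinase-substrate relationships as (head, relation, tail) triples; the
        # relation name encodes the site/motif context. All entries are illustrative.
        known_phosphorylations = [
            ("AKT1",  "phosphorylates_RxRxxS_motif", "GSK3B"),
            ("LATS1", "phosphorylates_HxRxxS_motif", "YAP1"),
        ]

        def rank_candidate_substrates(kinase, relation, candidates, score_fn):
            """Rank candidate substrates by a link predictor's triple score
            (higher = more plausible), e.g. to prioritise wet-lab validation."""
            return sorted(candidates,
                          key=lambda sub: score_fn(kinase, relation, sub),
                          reverse=True)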

    Mortality from gastrointestinal congenital anomalies at 264 hospitals in 74 low-income, middle-income, and high-income countries: a multicentre, international, prospective cohort study

    Summary Background Congenital anomalies are the fifth leading cause of mortality in children younger than 5 years globally. Many gastrointestinal congenital anomalies are fatal without timely access to neonatal surgical care, but few studies have been done on these conditions in low-income and middle-income countries (LMICs). We compared outcomes of the seven most common gastrointestinal congenital anomalies in low-income, middle-income, and high-income countries globally, and identified factors associated with mortality. Methods We did a multicentre, international prospective cohort study of patients younger than 16 years, presenting to hospital for the first time with oesophageal atresia, congenital diaphragmatic hernia, intestinal atresia, gastroschisis, exomphalos, anorectal malformation, and Hirschsprung’s disease. Recruitment was of consecutive patients for a minimum of 1 month between October, 2018, and April, 2019. We collected data on patient demographics, clinical status, interventions, and outcomes using the REDCap platform. Patients were followed up for 30 days after primary intervention, or 30 days after admission if they did not receive an intervention. The primary outcome was all-cause, in-hospital mortality for all conditions combined and each condition individually, stratified by country income status. We did a complete case analysis. Findings We included 3849 patients with 3975 study conditions (560 with oesophageal atresia, 448 with congenital diaphragmatic hernia, 681 with intestinal atresia, 453 with gastroschisis, 325 with exomphalos, 991 with anorectal malformation, and 517 with Hirschsprung’s disease) from 264 hospitals (89 in high-income countries, 166 in middle-income countries, and nine in low-income countries) in 74 countries. Of the 3849 patients, 2231 (58·0%) were male. Median gestational age at birth was 38 weeks (IQR 36–39) and median bodyweight at presentation was 2·8 kg (2·3–3·3). Mortality among all patients was 37 (39·8%) of 93 in low-income countries, 583 (20·4%) of 2860 in middle-income countries, and 50 (5·6%) of 896 in high-income countries (p<0·0001 between all country income groups). Gastroschisis had the greatest difference in mortality between country income strata (nine [90·0%] of ten in low-income countries, 97 [31·9%] of 304 in middle-income countries, and two [1·4%] of 139 in high-income countries; p≤0·0001 between all country income groups). Factors significantly associated with higher mortality for all patients combined included country income status (low-income vs high-income countries, risk ratio 2·78 [95% CI 1·88–4·11], p<0·0001; middle-income vs high-income countries, 2·11 [1·59–2·79], p<0·0001), sepsis at presentation (1·20 [1·04–1·40], p=0·016), higher American Society of Anesthesiologists (ASA) score at primary intervention (ASA 4–5 vs ASA 1–2, 1·82 [1·40–2·35], p<0·0001; ASA 3 vs ASA 1–2, 1·58 [1·30–1·92], p<0·0001), surgical safety checklist not used (1·39 [1·02–1·90], p=0·035), and ventilation or parenteral nutrition unavailable when needed (ventilation 1·96 [1·41–2·71], p=0·0001; parenteral nutrition 1·35 [1·05–1·74], p=0·018). Administration of parenteral nutrition (0·61 [0·47–0·79], p=0·0002) and use of a peripherally inserted central catheter (0·65 [0·50–0·86], p=0·0024) or percutaneous central line (0·69 [0·48–1·00], p=0·049) were associated with lower mortality.
Interpretation Unacceptable differences in mortality exist for gastrointestinal congenital anomalies between low-income, middle-income, and high-income countries. Improving access to quality neonatal surgical care in LMICs will be vital to achieve Sustainable Development Goal 3.2 of ending preventable deaths in neonates and children younger than 5 years by 2030.

    Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries

    Abstract Background Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres. Methods This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low–middle-income countries. Results In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing use of anaesthetic gases; and appropriate clinical waste processing. In phase 4, the top three shortlisted interventions for low–middle-income countries were: introducing reusable surgical devices; reducing use of consumables; and reducing the use of general anaesthesia. Conclusion This is a step toward environmentally sustainable operating environments with actionable interventions applicable to both high-income and low–middle-income countries.

    Knowledge graph mining with latent shape graphs

    Knowledge graphs are graph-structured knowledge bases that have proven to be of great value in many Artificial Intelligence applications in academia and industry alike. They are typically generated automatically from un-/semi-structured data sources. However, the growing adoption of knowledge graphs has been hampered by multiple challenges arising from the size and quality of the information they contain. This thesis explores the relationship between the quality of knowledge graphs and the machine learning technologies used to discover and extract knowledge from them. We focus on quality in terms of completeness and consistency. Knowledge graphs provide the flexibility required for representing knowledge at different scales in open environments such as the Web. However, their versatility means they have an ever-changing schema, which makes their content hard to summarise and understand. Moreover, they are typically never complete, even in very specific domains, and their consistency with respect to a given schema or ontology cannot be guaranteed without the corresponding validation. This lack of an accurate schema has proven problematic in use cases where applications need to rely on the data satisfying a set of constraints. The contribution of this thesis is twofold. Firstly, we propose a scalable data-driven method to exhibit the actual (latent) shape of graph data. We introduce an algorithm for mining relation cardinality bounds and building so-called shapes that exhibit important aspects of the structure (or topological information) of entities and relations in a knowledge graph. Latent shapes also allow us to formalise an approximate algorithm for validating the structure of knowledge graphs. Secondly, we exploit the latent shapes of entities and relations to enhance the performance of machine learning models aimed at predicting missing links and completing knowledge graphs. We use local pattern information and graph-based feature models in the Bioinformatics domain to improve the prediction of adverse drug reactions, achieving new state-of-the-art results. Finally, we extend latent feature models by encoding the cardinality of relations as a regularisation term used to learn semantic embeddings that improve the precision of downstream prediction tasks on benchmark datasets.
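
    A simplified sketch of the kind of cardinality mining and approximate validation described above, under my own assumptions rather than the thesis' algorithm: per relation, collect the number of distinct objects each subject has, derive min/max bounds, and flag entities whose counts fall outside them.

        from collections import defaultdict

        def mine_cardinality_bounds(triples):
            """triples: collection of (subject, predicate, object) tuples.
            Returns {predicate: (min, max)} bounds on the number of distinct objects
            per subject, taken over the subjects that actually use the predicate."""
            objects = defaultdict(set)                  # (predicate, subject) -> objects
            for s, p, o in triples:
                objects[(p, s)].add(o)
            degrees = defaultdict(list)                 # predicate -> observed out-degrees
            for (p, _s), objs in objects.items():
                degrees[p].append(len(objs))
            return {p: (min(ds), max(ds)) for p, ds in degrees.items()}

        def find_violations(triples, bounds):
            """Approximate validation: yield (subject, predicate, count) entries whose
            out-degree falls outside the mined bounds."""
            objects = defaultdict(set)
            for s, p, o in triples:
                objects[(p, s)].add(o)
            for (p, s), objs in objects.items():
                lo, hi = bounds.get(p, (0, float("inf")))
                if not lo <= len(objs) <= hi:
                    yield s, p, len(objs)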

    Using Linked Data to mine RDF from Wikipedia's tables

    The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried against. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge-base to find pre-existing relations between entities in Wikipedia tables, suggesting the same relations as holding for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%. This work was supported in part by Fujitsu (Ireland) Ltd., by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004, and by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289. Peer-reviewed.
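
    The core method can be pictured with the simplified sketch below (my own rendering, not the authors' code; the confidence features and classifier are omitted): for a pair of table columns, collect the relations an existing knowledge base already asserts between entities co-occurring on some rows, then propose the most frequent one as the candidate relation for the remaining rows.

        from collections import Counter

        def suggest_relation(rows, col_a, col_b, kb_relations):
            """rows: list of dicts mapping column name -> entity URI.
            kb_relations: {(subject_uri, object_uri): set of relation URIs} looked up
            in an existing Linked Data knowledge base.
            Returns (relation, support) for the best-supported relation, or None."""
            votes = Counter()
            for row in rows:
                pair = (row.get(col_a), row.get(col_b))
                for rel in kb_relations.get(pair, ()):
                    votes[rel] += 1
            return votes.most_common(1)[0] if votes else None

        def extract_triples(rows, col_a, col_b, relation):
            """Materialise candidate RDF triples for every row using the suggested relation."""
            return [(row[col_a], relation, row[col_b])
                    for row in rows if col_a in row and col_b in row]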

    On learnability of constraints from RDF data

    RDF is structured, dynamic, and schemaless data, which enables a great deal of flexibility for Linked Data to be made available in an open environment such as the Web. However, for RDF data, this flexibility turns out to be the source of many data quality and knowledge representation issues. Tasks such as assessing data quality in RDF require a different set of techniques and tools compared to other data models. Furthermore, since the use of existing schema, ontology and constraint languages is not mandatory, there is always room for misunderstanding the structure of the data. Neglecting this problem can represent a threat to the widespread use and adoption of RDF and Linked Data. Users should be able to learn the characteristics of RDF data in order to determine its fitness for a given use case, for example. For that purpose, in this doctoral research, we propose the use of constraints to inform users about the characteristics that RDF data naturally exhibits, in cases where ontologies (or any other form of explicitly given constraints or schemata) are not present or not expressive enough. We aim to address the problems of defining and discovering classes of constraints to help users in the analysis and assessment of RDF and Linked Data quality. TOMOE project funded by Fujitsu Laboratories Limited and the Insight Centre for Data Analytics at NUI Galway.
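
    As a toy illustration of what discovering a constraint from the data can look like (a sketch under my own assumptions, not the method proposed in this research): measure how strongly the data supports a candidate constraint, here of the form 'every instance of a class has at least one value for a property'.

        RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

        def constraint_support(triples, cls, prop):
            """Support for the candidate constraint 'every instance of cls has at least
            one value for prop'; triples is a collection of (subject, predicate, object)."""
            instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
            if not instances:
                return None
            having_prop = {s for s, p, o in triples if p == prop and s in instances}
            return len(having_prop) / len(instances)

        # A ratio close to 1.0 suggests the data naturally exhibits the constraint, e.g.
        # constraint_support(data, "http://schema.org/Person", "http://schema.org/name")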
